Text normalization: English Wikipedia

Text normalization

Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure.
== Applications ==

Text normalization is frequently used when converting text to speech. Numbers, dates, acronyms, and abbreviations are non-standard "words" that need to be pronounced differently depending on context.〔Sproat, R.; Black, A.; Chen, S.; Kumar, S.; Ostendorf, M.; Richards, C. (2001). "Normalization of non-standard words." ''Computer Speech and Language'' 15: 287–333. doi:10.1006/csla.2001.0169.〕 For example:
* "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan.〔MyLanguages.org〕
* "vi" could be pronounced as "vie," "vee," or "the sixth" depending on the surrounding words.〔MSDN〕
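
The "$200" case above can be illustrated with a minimal Python sketch. It is not a production text-to-speech front end: the function names `number_to_words` and `expand_dollars` are hypothetical, only whole-dollar amounts from 0 to 999 are handled, and the output is English only. Context-dependent cases such as "vi" need information about the surrounding words and are not attempted here.

```python
import re

# Word tables for English numbers below one hundred.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999 in English (hypothetical helper)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def expand_dollars(text):
    """Replace "$N" (N = 1 to 3 digits) with its spoken English form."""
    def repl(match):
        amount = int(match.group(1))
        unit = "dollar" if amount == 1 else "dollars"
        return number_to_words(amount) + " " + unit

    # The lookahead leaves longer or fractional amounts ("$2000", "$2,000",
    # "$2.50") untouched, since this sketch cannot pronounce them.
    return re.sub(r"\$(\d{1,3})(?!\d|[.,]\d)", repl, text)
```

For instance, `expand_dollars("It costs $200.")` yields `"It costs two hundred dollars."`. A Samoan front end would need its own word tables and rules, which is why no single normalization procedure serves every application.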
Text can also be normalized for storing and searching in a database. For instance, if a search for "resume" is to match the word "résumé", then the text would be normalized by removing diacritical marks; and if "john" is to match "John", the text would be converted to a single case. To prepare text for searching, it might also be stemmed or lemmatized (e.g. reducing "flew" and "flying" both to "fly"), canonicalized (e.g. consistently using American or British English spelling), or have stop words removed.
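
The diacritic removal, case conversion, and stop-word removal described above can be sketched with Python's standard `unicodedata` module. The function names and the tiny `STOP_WORDS` set are assumptions made for illustration, not a standard list; reducing inflected forms such as "flew" to "fly" is left out because it needs a dictionary or a dedicated library.

```python
import unicodedata

# Illustrative stop-word set only; real systems use much larger lists.
STOP_WORDS = {"a", "an", "the", "of", "and"}


def strip_diacritics(text):
    """Remove diacritical marks, e.g. "résumé" becomes "resume"."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a nonzero class for combining marks, which are dropped.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_for_search(text):
    """Return case-folded, accent-free tokens with stop words removed."""
    folded = strip_diacritics(text).casefold()
    return [token for token in folded.split() if token not in STOP_WORDS]
```

Applying the same function to both the stored text and the query is what makes "resume" match "résumé" and "john" match "John": `normalize_for_search("The Résumé of John")` gives `["resume", "john"]`.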

Excerpt source: Wikipedia, the free encyclopedia